Climate change#

In the following sections, we use Python to demonstrate how to access multiples datasets from the Atmosphere sub-catalog.

Environment setup#

[1]:
from distributed import Client
import intake
import hvplot.xarray
import hvplot.pandas
from dask.distributed import PipInstall
import dask
import xoak
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import xarray as xr
import numpy as np
import dask
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import fsspec
import seaborn as sns

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
/tmp/ipykernel_3240/326864322.py:16: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from tqdm.autonotebook import tqdm

We use a Dask client to ensure all following code compatible with the framework run in parallel

[2]:
client = Client()
client
[2]:

Client

Client-c21979c0-6a5d-11ed-8ca8-000d3aecfde1

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

Accessing the data#

We are now ready to access our catalog which uses Intake to organize all our datasets.

Intake is a lightweight package for finding, investigating, loading and disseminating data. A cataloging system is used to organize a collection of datasets and data loaders (drivers) are parameterized such that datasets are opened in the desired format for the end user. In the python context, multi-dimensional xarrays could be opened with xarray’s drivers while polygons (shapefiles, geojson) could be opened with geopandas.

Here is the URL from where we can open the catalog:

a) CMIP6#

In order to arrange the collection of datasets, the catalogue itself makes references to various sub-catalogs:

[3]:
col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json')
col

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0
[4]:
col.df['variable_id'].unique()
[4]:
array(['ps', 'rsds', 'rlus', 'rlds', 'psl', 'hurs', 'huss', 'hus', 'hfss',
       'rsus', 'evspsbl', 'rsdt', 'hfls', 'rsut', 'clt', 'zg', 'ts', 'va',
       'uas', 'vas', 'tas', 'ta', 'ua', 'pr', 'tauv', 'prc', 'tauu',
       'rsutcs', 'wtem', 'vtem', 'prw', 'prsn', 'rlut', 'rlutcs',
       'tasmax', 'tasmin', 'emidust', 'emiss', 'mmrbc', 'mmrdust',
       'mmroa', 'mmrpm2p5', 'o3', 'mmrsoa', 'mmrss', 'od550lt1aer', 'oh',
       'emidms', 'mmrso4', 'cltc', 'ptp', 'airmass', 'ccb', 'cdnc', 'toz',
       'so2', 'rsutcsaf', 'wa', 'rlutcsaf', 'rlutaf', 'od870aer',
       'od550aer', 'abs550aer', 'rsutaf', 'snw', 'mrsos', 'mrso', 'mrro',
       'areacella', 'siconc', 'basin', 'mrros', 'mlotst', 'clivi', 'hur',
       'sfdsi', 'co2mass', 'rsntds', 'masso', 'soga', 'evspsblsoi', 'sos',
       'sosga', 'tauuo', 'sfcWind', 'clwvi', 'vo', 'vmo', 'uo', 'umo',
       'tosga', 'tauvo', 'tos', 'thetao', 'thetaoga', 'pbo', 'thkcello',
       'orog', 'volo', 'wfo', 'cllcalipso', 'evspsblpot', 'wap', 'mrsol',
       'so', 'wmo', 'rld', 'evspsblveg', 'lai', 'prveg', 'zos',
       'bigthetaoga', 'deptho', 'hfgeou', 'masscello', 'bigthetao', 'siv',
       'tran', 'wo', 'sisnmass', 'simass', 'siarean', 'siconca',
       'siextentn', 'sisnconc', 'sispeed', 'sisnthick', 'zostoga',
       'sitemptop', 'sithick', 'sitimefrac', 'siu', 'areacello', 'sivol',
       'siareas', 'sivols', 'gpp', 'hursmax', 'hursmin', 'sndmasssnf',
       'sivoln', 'sfcWindmax', 'rootd', 'sftgif', 'hfbasin', 'hfds',
       'areacellr', 'mrsofc', 'wap500', 'albisccp', 'ta700', 'utendepfd',
       'rldscs', 'cltisccp', 'rsuscs', 'clcalipso', 'rsdscs', 'rtmt',
       'sci', 'clhcalipso', 'sftlf', 'zg1000', 'rv850', 'utendnogw',
       'utendvtem', 'tslsi', 'tnhusc', 'dryoa', 'c3h8', 'loadbc',
       'cldnvi', 'tntr', 'tntmp', 'tntc', 'c2h6', 'tnta', 'zg500', 'tnt',
       'tnhusa', 'rsdsdiff', 'rsdscsdiff', 'loadsoa', 'loadpoa',
       'clmcalipso', 'clisccp', 'loadss', 'tnhusmp', 'tnhus', 'loaddust',
       'loadoa', 'loadso4', 'psitem', 'epfz', 'epfy', 'cltcalipso',
       'cfc12', 'sf6', 'cfc11', 'epc100', 'dpo2', 'sios',
       'treeFracNdlEvg', 'treeFracBdlEvg', 'treeFracBdlDcd', 'utendogw',
       'utendwtem', 'tntnogw', 'tntogw', 'tntrl', 'tntrlcs', 'grassFrac',
       'vegFrac', 'vtendogw', 'vtendnogw', 'mrfso', 'zmeso', 'volcello',
       'si', 'talk', 'spco2', 'talkos', 'cropFrac', 'zmesoos', 'pon',
       'snm', 'snd', 'snc', 'hfdsn', 'phymisc', 'pp', 'phyfeos',
       'phydiaz', 'phydiatos', 'phydiat', 'phycos', 'phyc', 'phos',
       'phydiazos', 'dissicos', 'limfepico', 'limfemisc', 'limfediaz',
       'limfediat', 'intpppico', 'epp100', 'epn100', 'epfe100',
       'chlmiscos', 'chldiatos', 'chl', 'calcos', 'bsios', 'bfeos',
       'baccos', 'aragos', 'tsl', 'chlpicoos', 'epcalc100', 'chlos',
       'chldiazos', 'limirrdiat', 'nbp', 'pastureFrac', 'ra', 'rh',
       'treeFrac', 'npp', 'limirrdiaz', 'limirrmisc', 'limirrpico',
       'intdic', 'o2sat', 'intdoc', 'intpbfe', 'intpbn', 'intpbp',
       'intpcalcite', 'intpn2', 'intpoc', 'intpp', 'intparag',
       'eparag100', 'intpbsi', 'graz', 'fgco2', 'limndiat', 'limnmisc',
       'limnpico', 'nh4', 'nh4os', 'no3', 'fgo2', 'no3os', 'intppnitrate',
       'intppmisc', 'intppdiaz', 'intppdiat', 'epsi100', 'expc', 'o2',
       'dissocos', 'expcalc', 'expfe', 'ph', 'pcalc', 'pbsi', 'pbfe',
       'parag', 'fescav', 'fediss', 'expsi', 'phynos', 'phypico',
       'phypicoos', 'phypos', 'physios', 'po4', 'ponos', 'zmicro',
       'zmicroos', 'zooc', 'expn', 'expp', 'po4os', 'exparag', 'zoocos',
       'arag', 'o2satos', 'o2os', 'co3', 'co3os', 'co3satarag',
       'co3sataragos', 'co3satcalc', 'detocos', 'dfe', 'dfeos', 'dissic',
       'dissoc', 'co3satcalcos', 'dpco2', 'chlmisc', 'bacc', 'bfe', 'bsi',
       'calc', 'chldiat', 'chlpico', 'detoc', 'chldiaz', 'phyfe', 'phyn',
       'phyp', 'cct', 'cfc113global', 'cfc11global', 'cfc12global',
       'ch4global', 'ci', 'cli', 'clw', 'cl', 'physi', 'pnitrate', 'pop',
       'ppdiat', 'ppmisc', 'pppico', 'ppdiaz', 'n2oglobal', 'phymiscos',
       'hus850', 'ta500', 'co23D', 'co2s', 'cProduct', 'cVeg',
       'hcfc22global', 'cLand', 'concdust', 'sconcss', 'sconcso4',
       'sconcdust', 'rhLut', 'raLut', 'nppLut', 'nep', 'vegHeight',
       'gppLut', 'laiLut', 'tntrs', 'tatp', 'emico', 'emiisop', 'emilnox',
       'eminh3', 'emibc', 'lossn2o', 'emibvoc', 'lossco', 'lossch4',
       'jno2', 'isop', 'hcl', 'hcho', 'emiso2', 'emioa', 'eminox', 'hno3',
       'rsd', 'rsdcs', 'n2o', 'dryso4', 'rsu', 'rsucs', 'ch3coch3', 'ch4',
       'cheaqpso4', 'nh50', 'no', 'mmrpm10', 'mmrno3', 'mmrnh4',
       'od550bc', 'od550dust', 'mmrpm1', 'no2', 'o3loss', 'od440aer',
       'o3ste', 'o3prod', 'dryso2', 'dryo3', 'drynoy', 'drynh4', 'drynh3',
       'dryss', 'drydust', 'dms', 'co2', 'co', 'chepsoa', 'chegpso4',
       'drybc', 'lwp', 'reffclwtop', 'photo1d', 'pan', 'od550ss',
       'od550soa', 'od550so4', 'od550oa', 'od550no3', 'tntrscs', 'c3h6',
       'bldep', 'aoanh', 'tntscp', 'cSoil', 'msftyz', 'cLitter', 'fsfe',
       'dcalc', 'bddtdip', 'bddtdife', 'bddtdic', 'bddtalk', 'remoc',
       'fFireNat', 'fLuc', 'fProductDecomp', 'fracLut', 'hflsLut',
       'hfssLut', 'hussLut', 'fAnthDisturb', 'ec550aer', 'mrlso', 'mrsfl',
       'mc', 'wetdust', 'wetnh3', 'wetnh4', 'wetnoy', 'wetoa', 'wetso2',
       'wetso4', 'wetss', 'ztp', 'wetbc', 'tropoz', 'meanage', 'mrsll',
       'tasLut', 'mrroLut', 'tslsiLut', 'mrsoLut', 'tnhuspbl', 'tnhusscp',
       'tntpbl', 'mrsosLut', 'chepasoa', 'netAtmosLandCO2Flux', 'rls',
       'rlusLut', 'rss', 'rsusLut', 'mfo', 'edt', 'evu', 'emiaoa',
       'emiso4', 'somint', 'rsdoabsorb', 'osalttend', 'osaltrmadvect',
       'hfcorr', 'osaltpsmadvect', 'osaltpmdiff', 'opottemppmdiff',
       'opottemppadvect', 'opottempmint', 'opottempdiff',
       'opottemppsmadvect', 'opottemprmadvect', 'opottemptend',
       'osaltpadvect', 'osaltdiff', 'prthetao', 'pathetao', 'wfcorr',
       'tob', 'zossq', 'tossq', 'sossq', 'omldamax', 'friver', 'evs',
       'sftof', 'mlotstmax', 'msftyrho', 'pso', 'msftyzsmpa', 'mlotstmin',
       'sob', 'parasolRefl', 'clwmodis', 'cltmodis', 'clmisr', 'climodis',
       'clcalipsoice', 'clcalipsoliq', 'sfpm25', 'sfo3', 'sfno2',
       'cVegLut', 'cSoilLut', 'rsdo', 'pfull', 'phalf', 'icfriver',
       'fHarvest', 'fFire', 'fGrazing', 'mlotstsq', 'tnpeo', 'zo2min',
       'difvmto', 'tnkebto', 'dispkevfo', 'difvtrto', 'difvso', 'difvmo',
       'difvho', 'diftrxylo', 'dispkexyfo', 'diftrelo', 'sftgrf',
       'diftrblo', 'fsn', 'fsitherm', 'agessc', 'prra', 'o2min',
       'msftbarot', 'froc', 'fric', 'ficeberg', 'fbddtdip', 'fbddtdin',
       'fbddtdife', 'fbddtdic', 'fbddtalk', 'fbddtdisi', 'rlu',
       'baresoilFrac', 'clayfrac', 'ksat', 'sandfrac', 'slthick', 'tdps',
       'prhmax', 'hcont300', 'ocontempdiff', 'ocontemppadvect',
       'ocontemppmdiff', 'ocontemprmadvect', 'ocontemptend', 'cLitterLut',
       'cSoilSlow', 'cProductLut', 'cSoilFast', 'residualFrac',
       'cSoilMedium', 'cRoot', 'cLeaf', 'cLitterAbove', 'cLitterBelow',
       'hfy', 'obvfsq', 'ta850', 'cod', 'ttop', 'msftyzmpa', 'vsfsit',
       'wilt', 'siltfrac', 'fldcapacity', 'simpmass', 'simpconc', 'dmso',
       'ocfriver', 'hfx', 'burntFractionAll', 'zfull', 'shrubFrac',
       'hfgeoubed', 'lithk', 'topg', 'sbl', 'msftmzmpa', 'cCwd',
       'tasminCrop', 'tasmaxCrop', 'msftmzsmpa', 'zsatarag', 'spco2abio',
       'bddtdin', 'bddtdisi', 'fddtalk', 'dpco2nat', 'vsf', 'phabioos',
       'phnatos', 'ppos', 'rlntds', 'fddtdisi', 'fgco2abio', 'fgco2nat',
       'frfe', 'frn', 'dissicnatos', 'fddtdic', 'dpco2abio', 'fddtdife',
       'fddtdin', 'fddtdip', 'dissicnat', 'dissicabio', 'msftmz', 'tsn',
       'talknat', 'dissi14cabio', 'fg14co2abio', 'spco2nat', 'co3nat',
       'phnat', 'fco2nat', 'fracInLut', 'fracOutLut', 'chlcalcos',
       'chlcalc', 'phycalc', 'msftmrho', 'sidmassmeltbot', 'sidmasssi',
       'sifllwutop', 'siforcecorioly', 'sidmassth', 'siflcondtop',
       'sidmassdyn', 'siforcecoriolx', 'siforceintstry',
       'ocontemppsmadvect', 'difmxylo', 'msftyrhompa', 'msftmrhompa',
       'phabio', 'co3abio', 'difmxybo'], dtype=object)
[5]:
col.df
[5]:
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon ps gn gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
1 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon rsds gn gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
2 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon rlus gn gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
3 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon rlds gn gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
4 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon psl gn gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
... ... ... ... ... ... ... ... ... ... ... ...
514813 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r1i1p1f1 Amon tas gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20211207
514814 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r1i1p1f1 Amon tauu gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20211207
514815 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r1i1p1f1 Amon hur gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20211207
514816 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r1i1p1f1 Amon hus gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20211207
514817 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r1i1p1f1 Amon tauv gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20211207

514818 rows × 11 columns

The following examples will focus on datasets from the Atmosphere sub-catalog.

Even though our catalogue is constantly expanding, some datasets are already available. The next sections contain several examples of queries as well as analyses of various ones.

The current Atmosphere catalog is presented below in a table format. A dataset should be used after consulting the status field. If a dataset has a “dev” flag, it signifies that we are actively working on it and do not recommend using it. It is production-ready if it has a “prod” flag. The “prod” label signifies that the dataset has undergone quality review and testing, however users should always double-check on their own because errors are still possible.

[6]:
# load a few models to illustrate the problem
query = dict(experiment_id=["ssp585"],
             variable_id="tasmax",
             grid_label="gn",
             table_id='Amon',
             member_id='r1i1p1f1'
            )
cat = col.search(**query)

xarray_kwargs = {'consolidated': True, 'decode_times':False}

with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat.to_dataset_dict(xarray_open_kwargs=xarray_kwargs)

--> The keys in the returned dictionary of datasets are constructed as follows:
        'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [12/12 00:07<00:00]
[7]:
[key for key in dset_dict.keys()]

[7]:
['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Amon.gn',
 'ScenarioMIP.NUIST.NESM3.ssp585.Amon.gn',
 'ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.Amon.gn',
 'ScenarioMIP.CCCma.CanESM5.ssp585.Amon.gn',
 'ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.Amon.gn',
 'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Amon.gn',
 'ScenarioMIP.NCAR.CESM2-WACCM.ssp585.Amon.gn',
 'ScenarioMIP.FIO-QLNM.FIO-ESM-2-0.ssp585.Amon.gn',
 'ScenarioMIP.MIROC.MIROC6.ssp585.Amon.gn',
 'ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.Amon.gn',
 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Amon.gn',
 'ScenarioMIP.CAS.FGOALS-g3.ssp585.Amon.gn']
[8]:
ds = dset_dict[list(dset_dict.keys())[0]]
ds
[8]:
<xarray.Dataset>
Dimensions:         (lat: 96, bnds: 2, lon: 192, member_id: 1,
                     dcpp_init_year: 1, time: 1032)
Coordinates:
    height          float64 ...
  * lat             (lat) float64 -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
    lat_bnds        (lat, bnds) float64 dask.array<chunksize=(96, 2), meta=np.ndarray>
  * lon             (lon) float64 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
    lon_bnds        (lon, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
  * time            (time) int64 0 708 1416 2148 ... 750924 751656 752388 753120
    time_bnds       (time, bnds) float64 dask.array<chunksize=(1032, 2), meta=np.ndarray>
  * member_id       (member_id) object 'r1i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds
Data variables:
    tasmax          (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 516, 96, 192), meta=np.ndarray>
Attributes: (12/63)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      ScenarioMIP
    branch_method:                    standard
    branch_time_in_child:             60265.0
    branch_time_in_parent:            60265.0
    cmor_version:                     3.5.0
    ...                               ...
    intake_esm_attrs:variable_id:     tasmax
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/ScenarioMIP/MPI-M/MPI-...
    intake_esm_attrs:version:         20190710
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585....
[9]:
fig, axarr = plt.subplots(nrows=4, ncols=3, figsize=[30,20])
for ax,(k, ds) in zip(axarr.flat,dset_dict.items()):
    if 'member_id' in ds.dims:
        ds = ds.isel(member_id=0)
    ds.coords['lon'] = (ds.coords['lon'] + 180) % 360 - 180
    ds = ds.sortby(ds.lon)
    da = ds.tasmax.isel(time=0).squeeze().plot.pcolormesh(ax=ax, cmap='coolwarm')
    ax.set_title(k)
../../_images/notebooks_ipynb_climate_change_17_0.png
[10]:
fig, axarr = plt.subplots(nrows=4, ncols=3, figsize=[30,20])
for ax,(k, ds) in zip(axarr.flat,dset_dict.items()):
    if 'member_id' in ds.dims:
        ds = ds.isel(member_id=0)
    da = ds.tasmax.sel(lon=280, lat=45, method='nearest').squeeze().plot(ax=ax, color='blue')
    ax.set_title(k)
../../_images/notebooks_ipynb_climate_change_18_0.png

ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1950 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

Property

Values

Temporal extent:

01/01/1979 – 12/31/2020

Spatial extent:

World : [-180, 180, -90, 90]

Chunks (timeseries’s version):

{‘time’: 14880, ‘longitude’: 15, ‘latitude’: 15}

Chunks (spatial’s version):

{‘time’: 24, ‘longitude’: 1440, ‘latitude’: 721}

Spatial resolution:

0.25 degrees

Spatial reference:

WGS84 (EPSG:4326)

Temporal resolution:

1 hour

Update frequency:

In 2023, we will update it weekly

[11]:
[eid for eid in col.df['experiment_id'].unique() if 'ssp' in eid]

[11]:
['ssp585',
 'ssp245',
 'ssp370SST-lowCH4',
 'ssp370-lowNTCF',
 'ssp370SST-lowNTCF',
 'ssp370SST-ssp126Lu',
 'ssp370SST',
 'ssp370pdSST',
 'ssp119',
 'ssp370',
 'esm-ssp585-ssp126Lu',
 'ssp126-ssp370Lu',
 'ssp370-ssp126Lu',
 'ssp126',
 'esm-ssp585',
 'ssp245-GHG',
 'ssp245-nat',
 'ssp460',
 'ssp434',
 'ssp534-over',
 'ssp245-stratO3',
 'ssp245-aer',
 'ssp245-cov-modgreen',
 'ssp245-cov-fossil',
 'ssp245-cov-strgreen',
 'ssp245-covid',
 'ssp585-bgc']
[12]:
# there is currently a significant amount of data for these runs
expts = ['historical', 'ssp245', 'ssp585']

query = dict(
    experiment_id=expts,
    table_id='Amon',
    variable_id=['tas'],
    member_id = 'r1i1p1f1',
)

col_subset = col.search(require_all_on=["source_id"], **query)
col_subset.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id"]
].nunique()
/usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/intake_esm/_search.py:80: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  for _, group in grouped:
[12]:
experiment_id variable_id table_id
source_id
ACCESS-CM2 3 1 1
AWI-CM-1-1-MR 3 1 1
BCC-CSM2-MR 3 1 1
CAMS-CSM1-0 3 1 1
CAS-ESM2-0 3 1 1
CESM2-WACCM 3 1 1
CIESM 3 1 1
CMCC-CM2-SR5 3 1 1
CMCC-ESM2 3 1 1
CanESM5 3 1 1
E3SM-1-1 3 1 1
EC-Earth3 3 1 1
EC-Earth3-CC 3 1 1
EC-Earth3-Veg 3 1 1
EC-Earth3-Veg-LR 3 1 1
FGOALS-f3-L 3 1 1
FGOALS-g3 3 1 1
FIO-ESM-2-0 3 1 1
GFDL-CM4 3 1 1
GFDL-ESM4 3 1 1
IITM-ESM 3 1 1
INM-CM4-8 3 1 1
INM-CM5-0 3 1 1
IPSL-CM6A-LR 3 1 1
KACE-1-0-G 3 1 1
KIOST-ESM 3 1 1
MIROC6 3 1 1
MPI-ESM1-2-HR 3 1 1
MPI-ESM1-2-LR 3 1 1
MRI-ESM2-0 3 1 1
NESM3 3 1 1
NorESM2-LM 3 1 1
NorESM2-MM 3 1 1
TaiESM1 3 1 1
[13]:
def drop_all_bounds(ds):
    drop_vars = [vname for vname in ds.coords
                 if (('_bounds') in vname ) or ('_bnds') in vname]
    return ds.drop(drop_vars)

def open_dset(df):
    assert len(df) == 1
    ds = xr.open_zarr(fsspec.get_mapper(df.zstore.values[0]), consolidated=True)
    return drop_all_bounds(ds)

def open_delayed(df):
    return dask.delayed(open_dset)(df)

from collections import defaultdict
dsets = defaultdict(dict)

for group, df in col_subset.df.groupby(by=['source_id', 'experiment_id']):
    dsets[group[0]][group[1]] = open_delayed(df)
[14]:
dsets_ = dask.compute(dict(dsets))[0]

[15]:
# calculate global means

def get_lat_name(ds):
    for lat_name in ['lat', 'latitude']:
        if lat_name in ds.coords:
            return lat_name
    raise RuntimeError("Couldn't find a latitude coordinate")

def global_mean(ds):
    lat = ds[get_lat_name(ds)]
    weight = np.cos(np.deg2rad(lat))
    weight /= weight.mean()
    other_dims = set(ds.dims) - {'time'}
    return (ds * weight).mean(other_dims)
[16]:
expt_da = xr.DataArray(expts, dims='experiment_id', name='experiment_id',
                       coords={'experiment_id': expts})

dsets_aligned = {}

for k, v in tqdm(dsets_.items()):
    expt_dsets = v.values()
    if any([d is None for d in expt_dsets]):
        print(f"Missing experiment for {k}")
        continue

    for ds in expt_dsets:
        ds.coords['year'] = ds.time.dt.year

    # workaround for
    # https://github.com/pydata/xarray/issues/2237#issuecomment-620961663
    dsets_ann_mean = [v[expt].pipe(global_mean)
                             .swap_dims({'time': 'year'})
                             .drop('time')
                             .coarsen(year=12).mean()
                      for expt in expts]

    # align everything with the 4xCO2 experiment
    dsets_aligned[k] = xr.concat(dsets_ann_mean, join='outer',
                                 dim=expt_da)
100%|██████████| 34/34 [00:13<00:00,  2.46it/s]
[17]:
with progress.ProgressBar():
    dsets_aligned_ = dask.compute(dsets_aligned)[0]

We can quickly choose data subsets in both space and time using xarray. Here, we choose July 19–20, 1996, a period when Quebec saw historically extreme precipitation (Canada). The graphic package hvplot can then be used to track the storm throughout the event.

[18]:
source_ids = list(dsets_aligned_.keys())
source_da = xr.DataArray(source_ids, dims='source_id', name='source_id',
                         coords={'source_id': source_ids})

big_ds = xr.concat([ds.reset_coords(drop=True)
                    for ds in dsets_aligned_.values()],
                    dim=source_da)

big_ds
[18]:
<xarray.Dataset>
Dimensions:        (year: 451, experiment_id: 3, source_id: 34)
Coordinates:
  * year           (year) float64 1.85e+03 1.851e+03 ... 2.299e+03 2.3e+03
  * experiment_id  (experiment_id) <U10 'historical' 'ssp245' 'ssp585'
  * source_id      (source_id) <U16 'ACCESS-CM2' 'AWI-CM-1-1-MR' ... 'TaiESM1'
Data variables:
    tas            (source_id, experiment_id, year) float64 287.0 287.0 ... nan
[ ]:

[19]:
df_all = big_ds.sel(year=slice(1900, 2100)).to_dataframe().reset_index()
df_all.head()
[19]:
year experiment_id source_id tas
0 1900.0 historical ACCESS-CM2 287.019917
1 1900.0 historical AWI-CM-1-1-MR 286.958154
2 1900.0 historical BCC-CSM2-MR 287.996260
3 1900.0 historical CAMS-CSM1-0 287.084974
4 1900.0 historical CAS-ESM2-0 287.263682
[20]:
sns.relplot(data=df_all,
            x="year", y="tas", hue='experiment_id',
            kind="line", errorbar='sd', aspect=2);
../../_images/notebooks_ipynb_climate_change_32_0.png

b) Cordex-NA#

[21]:
col = intake.open_esm_datastore('https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json')
col

aws-na-cordex catalog with 330 dataset(s) from 330 asset(s):

unique
variable 15
standard_name 10
long_name 18
units 10
spatial_domain 1
grid 2
spatial_resolution 2
scenario 6
start_time 3
end_time 4
frequency 1
vertical_levels 1
bias_correction 3
na-cordex-models 9
path 330
derived_variable 0
[22]:
# Show the first few lines of the catalog
col.df
[22]:
variable standard_name long_name units spatial_domain grid spatial_resolution scenario start_time end_time frequency vertical_levels bias_correction na-cordex-models path
0 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg eval 1979-01-01T12:00:00 2014-12-31T12:00:00 day 1 raw ['ERA-Int.CRCM5-UQAM', 'ERA-Int.CRCM5-OUR', 'E... s3://ncar-na-cordex/day/hurs.eval.day.NAM-22i....
1 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-44i 0.50 deg eval 1979-01-01T12:00:00 2015-12-31T12:00:00 day 1 raw ['ERA-Int.CRCM5-UQAM', 'ERA-Int.RegCM4', 'ERA-... s3://ncar-na-cordex/day/hurs.eval.day.NAM-44i....
2 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-Daymet ['CanESM2.CanRCM4'] s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
3 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET ['CanESM2.CanRCM4'] s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
4 hurs relative_humidity Near-Surface Relative Humidity % north_america NAM-22i 0.25 deg hist-rcp45 1949-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw ['GFDL-ESM2M.CRCM5-OUR', 'CanESM2.CRCM5-OUR', ... s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
325 vas northward_wind Northward Near-Surface Wind m s-1 north_america NAM-44i 0.50 deg rcp45 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM'... s3://ncar-na-cordex/day/vas.rcp45.day.NAM-44i....
326 vas northward_wind Northward Near-Surface Wind (Bias-Adjusted) m s-1 north_america NAM-22i 0.25 deg rcp85 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... s3://ncar-na-cordex/day/vas.rcp85.day.NAM-22i....
327 vas northward_wind Northward Near-Surface Wind m s-1 north_america NAM-22i 0.25 deg rcp85 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... s3://ncar-na-cordex/day/vas.rcp85.day.NAM-22i....
328 vas northward_wind Northward Near-Surface Wind (Bias-Adjusted) m s-1 north_america NAM-44i 0.50 deg rcp85 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 mbcn-gridMET ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... s3://ncar-na-cordex/day/vas.rcp85.day.NAM-44i....
329 vas northward_wind Northward Near-Surface Wind m s-1 north_america NAM-44i 0.50 deg rcp85 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw ['MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQ... s3://ncar-na-cordex/day/vas.rcp85.day.NAM-44i....

330 rows × 15 columns

[23]:
data_var = 'tmax'

col_subset = col.search(
    variable=data_var,
    grid="NAM-44i",
    bias_correction="raw",
    scenario='rcp45'
)

col_subset

aws-na-cordex catalog with 1 dataset(s) from 1 asset(s):

unique
variable 1
standard_name 1
long_name 1
units 1
spatial_domain 1
grid 1
spatial_resolution 1
scenario 1
start_time 1
end_time 1
frequency 1
vertical_levels 1
bias_correction 1
na-cordex-models 1
path 1
derived_variable 0
[24]:
col_subset.df

[24]:
variable standard_name long_name units spatial_domain grid spatial_resolution scenario start_time end_time frequency vertical_levels bias_correction na-cordex-models path
0 tmax air_temperature Daily Maximum Near-Surface Air Temperature degC north_america NAM-44i 0.50 deg rcp45 2006-01-01T12:00:00 2100-12-31T12:00:00 day 1 raw ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM'... s3://ncar-na-cordex/day/tmax.rcp45.day.NAM-44i...
[25]:
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = col_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True}, storage_options={"anon": True}
)
print(f"\nDataset dictionary keys:\n {dsets.keys()}")

# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
store_name = dataset_key + ".zarr"

ds = dsets[dataset_key]
ds

# Note that the summary includes a 'member_id' coordinate, which is a renaming of the
# 'na-cordex-models' column in the catalog.

--> The keys in the returned dictionary of datasets are constructed as follows:
        'variable.frequency.scenario.grid.bias_correction'
100.00% [1/1 00:01<00:00]

Dataset dictionary keys:
 dict_keys(['tmax.day.rcp45.NAM-44i.raw'])
[25]:
<xarray.Dataset>
Dimensions:    (lat: 129, lon: 300, member_id: 6, time: 34698, bnds: 2)
Coordinates:
  * lat        (lat) float64 12.25 12.75 13.25 13.75 ... 74.75 75.25 75.75 76.25
  * lon        (lon) float64 -171.8 -171.2 -170.8 ... -23.25 -22.75 -22.25
  * member_id  (member_id) <U21 'MPI-ESM-LR.CRCM5-UQAM' ... 'CanESM2.CanRCM4'
  * time       (time) datetime64[ns] 2006-01-01T12:00:00 ... 2100-12-31T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(17349, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tmax       (member_id, time, lat, lon) float32 dask.array<chunksize=(4, 1000, 65, 150), meta=np.ndarray>
Attributes: (12/41)
    CORDEX_domain:                        NAM-44
    contact:                              {"MPI-ESM-LR.CRCM5-UQAM": "Winger.K...
    creation_date:                        {"MPI-ESM-LR.CRCM5-UQAM": "2012-09-...
    driving_experiment:                   {"MPI-ESM-LR.CRCM5-UQAM": "MPI-M-MP...
    driving_experiment_name:              rcp45
    driving_model_ensemble_member:        {"MPI-ESM-LR.CRCM5-UQAM": "r1i1p1",...
    ...                                   ...
    intake_esm_attrs:vertical_levels:     1
    intake_esm_attrs:bias_correction:     raw
    intake_esm_attrs:na-cordex-models:    ['MPI-ESM-LR.CRCM5-UQAM', 'CanESM2....
    intake_esm_attrs:path:                s3://ncar-na-cordex/day/tmax.rcp45....
    intake_esm_attrs:_data_format_:       zarr
    intake_esm_dataset_key:               tmax.day.rcp45.NAM-44i.raw
[26]:
ds.tmax \
.sel(lat=45, lon=-75, method='nearest') \
.hvplot(x='time',by='member_id', width=750, height=500, grid=True) \
.opts(legend_position='bottom')
[26]: